AITopics | parallel execution

parallel execution

Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing

Sun, Zekai, Guan, Xiuxian, Lin, Zheng, Fang, Zihan, Cai, Xiangming, Chen, Zhe, Liu, Fangming, Cui, Heming, Xiong, Jie, Ni, Wei, Yuen, Chau

arXiv.org Artificial Intelligence, Sep-24-2025

Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while coping with limited computational resources and battery life. While Mobile Edge Computing (MEC) offers collaborative inference with GPU servers as a promising solution, existing approaches primarily rely on layer-wise model partitioning and suffer significant transmission bottlenecks caused by the sequential execution of DNN operations. To address this challenge, we present Intra-DP, a high-performance collaborative inference system optimized for DNN inference on MEC. Intra-DP employs a novel parallel computing technique based on local operators (i.e., operators whose minimum unit input is not the entire input tensor, such as the convolution kernel). By decomposing their computations (operations) into several independent sub-operations and overlapping the computation and transmission of different sub-operations through parallel execution, Intra-DP mitigates transmission bottlenecks in MEC, achieving fast and energy-efficient inference. The evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines, without sacrificing accuracy.
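The decomposition of a local operator into independent sub-operations can be illustrated with a toy 1D convolution. This is a minimal sketch, not Intra-DP's implementation: the function names, the chunking rule, and the use of a thread pool are all assumptions made for illustration, and the overlap of computation with network transmission is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


def conv1d_valid(signal, kernel):
    """Plain 'valid' 1D cross-correlation: one output per full kernel window."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def split_with_halo(signal, kernel_size, parts):
    """Cut the input into overlapping chunks so each chunk can be convolved alone.

    Each chunk carries kernel_size - 1 extra trailing elements (the "halo"),
    which is what makes the sub-operations independent of one another.
    Assumes len(signal) >= kernel_size.
    """
    n_out = len(signal) - kernel_size + 1
    step = -(-n_out // parts)  # ceiling division
    return [
        signal[start:min(start + step, n_out) + kernel_size - 1]
        for start in range(0, n_out, step)
    ]


def conv1d_tiled(signal, kernel, parts=3):
    """Run the chunks concurrently and stitch the partial outputs together."""
    chunks = split_with_halo(signal, len(kernel), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial = pool.map(lambda chunk: conv1d_valid(chunk, kernel), chunks)
    return [y for piece in partial for y in piece]
```

Because a convolution output only depends on a kernel-sized window of the input, the tiled result matches the untiled one exactly; in a system like the one described, each chunk could in principle be sent or computed while others are still in flight.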

artificial intelligence, cloud computing, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.05829

Country: Asia > China (0.93)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Energy (0.67)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (1.00)
(5 more...)

GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination

Taleb, Nabil Abdelaziz Ferhat, Rezaei, Abdolazim, Patel, Raj Atulkumar, Sookhak, Mehdi

arXiv.org Artificial Intelligence, Jul-21-2025

Large Language Models (LLMs) offer significant promise for intelligent traffic management; however, current chain-based systems like TrafficGPT are hindered by sequential task execution, high token usage, and poor scalability, making them inefficient for complex, real-world scenarios. To address these limitations, we propose GraphTrafficGPT, a novel graph-based architecture, which fundamentally redesigns the task coordination process for LLM-driven traffic applications. GraphTrafficGPT represents tasks and their dependencies as nodes and edges in a directed graph, enabling efficient parallel execution and dynamic resource allocation. The main idea behind the proposed model is a Brain Agent that decomposes user queries, constructs optimized dependency graphs, and coordinates a network of specialized agents for data retrieval, analysis, visualization, and simulation. By introducing advanced context-aware token management and supporting concurrent multi-query processing, the proposed architecture handles interdependent tasks typical of modern urban mobility environments. Experimental results demonstrate that GraphTrafficGPT reduces token consumption by 50.2% and average response latency by 19.0% compared to TrafficGPT, while supporting simultaneous multi-query execution with up to 23.0% improvement in efficiency. Large Language Models (LLMs) have changed artificial intelligence capabilities across domains by enabling natural language understanding and generation at new levels. The recent models, such as GPT-4, Claude, and Llama, can comprehend complex instructions, reason through problems, and generate coherent responses across diverse applications [1].
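Representing tasks and dependencies as a directed graph, then running whatever is ready in parallel, can be sketched with Python's standard library. This is an illustrative sketch only: `run_task_graph`, the task names, and the handler signature are hypothetical and are not taken from GraphTrafficGPT, and plain callables stand in for the specialized agents.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(dependencies, handlers, max_workers=4):
    """Run every task as soon as all of its prerequisites have finished.

    dependencies maps task name -> set of prerequisite task names.
    handlers maps task name -> callable taking a dict of prerequisite results.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose prerequisites are already complete.
            for task in sorter.get_ready():
                inputs = {dep: results[dep] for dep in dependencies.get(task, ())}
                running[pool.submit(handlers[task], inputs)] = task
            # Block until at least one running task finishes, then unlock
            # its dependents by marking it done.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                results[task] = future.result()
                sorter.done(task)
    return results


# Hypothetical query plan: retrieval and simulation are independent, so
# they can run concurrently; analysis and visualization wait on them.
example_dependencies = {
    "retrieve": set(),
    "simulate": set(),
    "analyze": {"retrieve"},
    "visualize": {"analyze", "simulate"},
}
```

Independent branches of the graph run side by side, while a chain-based design would execute the same four steps strictly one after another.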

graphtrafficgpt, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.13511

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.89)

Industry: Transportation > Infrastructure & Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel

Cutler, Dylan, Kandoor, Arun, Dikkala, Nishanth, Saunshi, Nikunj, Wang, Xin, Panigrahy, Rina

arXiv.org Artificial Intelligence, Jan-26-2025

Standard decoding in a Transformer-based language model is inherently sequential, as we wait for a token's embedding to pass through all the layers in the network before starting the generation of the next token. In this work, we propose a new architecture, StagFormer (Staggered Transformer), which staggers execution along the time axis and thereby enables parallelizing the decoding process along the depth of the model. We achieve this by breaking the dependency of the token representation at time step $i$ in layer $l$ upon the representations of tokens until time step $i$ from layer $l-1$. Instead, we stagger the execution and only allow a dependency on token representations until time step $i-1$. The later sections of the Transformer still get access to the "rich" representations from the prior section, but only from those token positions which are one time step behind. StagFormer allows different sections of the model to be executed in parallel, yielding a potential 33% speedup in decoding while being quality neutral in our simulations. We also explore many natural variants of this idea. We present how weight-sharing across the different sections being staggered can be more practical in settings with limited memory. We show how one can approximate a recurrent model during inference using such weight-sharing. We explore the efficacy of using bounded window attention to pass information from one section to another, which helps drive further latency gains for some applications. We also demonstrate the scalability of the staggering idea over more than 2 sections of the Transformer.
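The effect of relaxing the dependency from time step $i$ to time step $i-1$ can be seen in a toy scheduling model. The sketch below is an assumption-laden illustration, not the paper's method: `decode_ticks` is a hypothetical helper, every section step is taken to cost exactly one tick, hardware is assumed unlimited, and all communication and attention overheads are ignored, so the numbers it returns should not be read as a prediction of the reported 33% figure.

```python
def decode_ticks(num_tokens, num_sections, staggered):
    """Earliest finishing tick of the last token, one tick per section step.

    Assumes unlimited parallel hardware and zero communication cost.
    """
    finish = {}  # (section, token) -> tick at which that step completes
    last = num_sections - 1
    for token in range(num_tokens):
        for section in range(num_sections):
            deps = []
            if token > 0:
                # Autoregression: a token exists only once the last
                # section has finished the previous position.
                deps.append(finish[(last, token - 1)])
            if section > 0:
                # Standard decoding reads the section below at the same
                # position; the staggered variant reads it one step behind.
                below = token - 1 if staggered else token
                if below >= 0:
                    deps.append(finish[(section - 1, below)])
            finish[(section, token)] = 1 + max(deps, default=0)
    return finish[(last, num_tokens - 1)]
```

With the same-position dependency, every section of every token lands on the critical path; with the one-step-behind dependency, all sections of a given token become ready at the same time and can run concurrently.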

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.15665

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models

Wang, Yongdong, Xiao, Runze, Kasahara, Jun Younes Louhi, Yajima, Ryosuke, Nagatani, Keiji, Yamashita, Atsushi, Asama, Hajime

arXiv.org Artificial Intelligence, Nov-13-2024

Large Language Models (LLMs) have demonstrated significant reasoning capabilities in robotic systems. However, their deployment in multi-robot systems remains fragmented and struggles to handle complex task dependencies and parallel execution. This study introduces the DART-LLM (Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models) system, designed to address these challenges. DART-LLM utilizes LLMs to parse natural language instructions, decomposing them into multiple subtasks with dependencies to establish complex task sequences, thereby enhancing efficient coordination and parallel execution in multi-robot systems. The system includes the QA LLM module, Breakdown Function modules, Actuation module, and a Vision-Language Model (VLM)-based object detection module, enabling task decomposition and execution from natural language instructions to robotic actions. Experimental results demonstrate that DART-LLM excels in handling long-horizon tasks and collaborative tasks with complex dependencies. Even when using smaller models like Llama 3.1 8B, the system achieves good performance, highlighting DART-LLM's robustness in terms of model size. Please refer to the project website https://wyd0817.github.io/project-dart-llm/ for videos and code.

dart-llm, dependency, robot, (11 more...)

arXiv.org Artificial Intelligence

2411.09022

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.06)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration

McDanel, Bradley

arXiv.org Artificial Intelligence, Oct-22-2024

Large language models typically generate tokens autoregressively, using each token as input for the next. Recent work on Speculative Decoding has sought to accelerate this process by employing a smaller, faster draft model to more quickly generate candidate tokens. These candidates are then verified in parallel by the larger (original) verify model, resulting in overall speedup compared to using the larger model by itself in an autoregressive fashion. In this work, we introduce AMUSD (Asynchronous Multi-device Speculative Decoding), a system that further accelerates generation by decoupling the draft and verify phases into a continuous, asynchronous approach. Unlike conventional speculative decoding, where only one model (draft or verify) performs token generation at a time, AMUSD enables both models to perform predictions independently on separate devices (e.g., GPUs). We evaluate our approach over multiple datasets and show that AMUSD achieves an average 29% improvement over speculative decoding and up to 1.96× speedup over conventional autoregressive decoding, while achieving identical output quality. Our system is open-source and available at https://github.com/BradMcDanel/AMUSD/.
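The draft-then-verify loop of conventional speculative decoding, which this abstract uses as its baseline, can be sketched with toy next-token functions. This is a simplified greedy variant written for illustration; it is not AMUSD's asynchronous multi-device scheme, and `speculative_decode`, `draft_next`, `verify_next`, and the parameter `k` are hypothetical names.

```python
def speculative_decode(draft_next, verify_next, prompt, num_new, k=4):
    """Greedy speculative decoding with toy next-token functions.

    draft_next / verify_next map a token list to the next token. The result
    always equals what verify_next alone would have produced.
    """
    tokens = list(prompt)
    target_len = len(prompt) + num_new
    while len(tokens) < target_len:
        # Draft phase: the cheap model guesses k tokens one after another.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # Verify phase: the large model checks every drafted position. On
        # real hardware these checks would be one batched forward pass;
        # here they are a plain loop.
        accepted = []
        for guess in draft:
            correct = verify_next(tokens + accepted)
            accepted.append(correct)
            if correct != guess:
                break  # first mismatch: keep the correction, drop the rest
        else:
            # Every guess matched, so the verifier's next token comes free.
            accepted.append(verify_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:target_len]
```

Only tokens produced by the verify model are ever kept, which is why output quality is unchanged; the speedup comes from accepting several tokens per verification round when the draft model guesses well. In this synchronous form the two models take turns, which is the idle time the asynchronous design described above aims to remove.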

artificial intelligence, large language model, natural language, (12 more...)

arXiv.org Artificial Intelligence

2410.17375

Country: North America > United States > Pennsylvania > Lancaster County > Lancaster (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Data Classification With Multiprocessing

Dixit, Anuja, Byreddy, Shreya, Song, Guanqun, Zhu, Ting

arXiv.org Artificial Intelligence, Dec-22-2023

Classification is one of the most important tasks in Machine Learning (ML), and with recent advancements in artificial intelligence (AI) it is important to find efficient ways to implement it. Generally, the choice of classification algorithm depends on the data it is dealing with, and the accuracy of the algorithm depends on the hyperparameters it is tuned with. One approach is to check the accuracy of an algorithm by executing it serially with different hyperparameters and then selecting the parameters that give the highest accuracy to predict the final output. This paper proposes another approach, in which the algorithm is trained in parallel with different hyperparameters to reduce the execution time. In the end, results from all the trained variations of the algorithm are ensembled to exploit the parallelism and improve the accuracy of prediction. Python multiprocessing is used to test this hypothesis with different classification algorithms such as K-Nearest Neighbors (KNN), Support Vector Machines (SVM), random forest and decision tree, and the paper reviews factors affecting parallelism. The ensembled output considers the predictions from all processes, and the final class is the one predicted by the maximum number of processes. Doing this increases the reliability of predictions. We conclude that ensembling improves accuracy and multiprocessing reduces execution time for selected algorithms.
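Running one classifier per hyperparameter value concurrently and taking a majority vote can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the tiny pure-Python KNN, the function names, and the choice of `k` values are hypothetical. To stay runnable in any context the sketch defaults to a thread pool as a stand-in; passing `ProcessPoolExecutor` as `executor_cls` (from inside an `if __name__ == "__main__":` guard) gives process-level parallelism closer to what the abstract describes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def knn_predict(job):
    """Classify each query point with k-nearest neighbours for one value of k."""
    k, train, queries = job
    predictions = []
    for q in queries:
        # Sort training rows by squared Euclidean distance to the query.
        nearest = sorted(
            train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], q))
        )[:k]
        labels = Counter(label for _, label in nearest)
        predictions.append(labels.most_common(1)[0][0])
    return predictions


def ensemble_predict(train, queries, ks=(1, 3, 5), executor_cls=ThreadPoolExecutor):
    """Run one classifier per hyperparameter concurrently, then majority-vote."""
    jobs = [(k, train, queries) for k in ks]
    with executor_cls(max_workers=len(ks)) as pool:
        per_model = list(pool.map(knn_predict, jobs))
    # zip(*per_model) groups the votes that all models cast for one query.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model)]
```

Here `train` is a list of `(features, label)` pairs. Each worker handles one hyperparameter setting, and the final class for a query is the label chosen by the largest number of workers.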

algorithm, classifier, execution, (16 more...)

arXiv.org Artificial Intelligence

2312.15152

Country:

North America > United States > Ohio (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.82)

Industry:

Energy (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Adaptive parallelization of multi-agent simulations with localized dynamics

Băbeanu, Alexandru-Ionuţ, Filatova, Tatiana, Kwakkel, Jan H., Yorke-Smith, Neil

arXiv.org Artificial Intelligence, Apr-4-2023

Agent-based modelling constitutes a versatile approach to representing and simulating complex systems. Studying large-scale systems is challenging because of the computational time required for the simulation runs: scaling is at least linear in system size (number of agents). Given the inherently modular nature of multi-agent-based simulations (MABSs), parallel computing is a natural approach to overcoming this challenge. However, because of the shared information and communication between agents, parallelization is not simple. We present a protocol for shared-memory, parallel execution of MABSs. This approach is useful for models that can be formulated in terms of sequential computations, and that involve updates that are localized, in the sense of involving small numbers of agents. The protocol has a bottom-up and asynchronous nature, allowing it to deal with heterogeneous computation in an adaptive, yet graceful manner. We illustrate the potential performance gains on exemplar cultural dynamics and disease spreading MABSs.

agent, artificial intelligence, simulation, (17 more...)

arXiv.org Artificial Intelligence

2304.01724

Country:

Europe > Netherlands > South Holland > Delft (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)

Automated Control and Simulation of Dynamic Robot Teams in the Domain of CFK Production

Glück, Roland, Körber, Marian

arXiv.org Artificial Intelligence, Oct-20-2022

This paper is concerned with the automation and simulation of pick-and-place processes in the domain of CFK (carbon-fibre-reinforced plastic) aircraft production. We introduce a workflow which starts from a CAD construction, extracts relevant data from it, assigns grippers to the CFK pieces and schedules the individual steps using a PDDL solver. Finally, the result is visualized in Blender, where earlier mistakes can also be identified.

artificial intelligence, planning & scheduling, ply, (16 more...)

arXiv.org Artificial Intelligence

2210.11213

Country: Europe > Portugal > Porto > Porto (0.04)

Genre:

Research Report (0.50)
Workflow (0.49)

Industry: Leisure & Entertainment > Games (0.51)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Plan Reordering and Parallel Execution — A Parameterized Complexity View

Aghighi, Meysam (Linköping University) | Bäckström, Christer (Linköping University)

AAAI Conferences, Feb-14-2017

Bäckström has previously studied a number of optimization problems for partial-order plans, like finding a minimum deordering (MCD) or reordering (MCR), and finding the minimum parallel execution length (PPL), which are all NP-complete. We revisit these problems, this time applying parameterized complexity analysis rather than standard complexity analysis. We consider various parameters, including both the original and desired size of the plan order, as well as its width and height. Our findings include that MCD and MCR are W[2]-hard and in W[P] when parameterized with the desired order size, and MCD is fixed-parameter tractable (fpt) when parameterized with the original order size. Problem PPL is fpt if parameterized with the size of the non-concurrency relation, but para-NP-hard in most other cases. We also consider this problem when the number (k) of agents, or processors, is restricted, finding that this number is a crucial parameter; this problem is fixed-parameter tractable with the order size, the parallel execution length and k as parameters, but para-NP-hard without k as a parameter.

artificial intelligence, execution, planning & scheduling, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.68)
Europe > Spain > Catalonia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Computational Aspects of Reordering Plans

Bäckström, C.

arXiv.org Artificial Intelligence, May-26-2011

This article studies the problem of modifying the action ordering of a plan in order to optimise the plan according to various criteria. One of these criteria is to make a plan less constrained and the other is to minimize its parallel execution time. Three candidate definitions are proposed for the first of these criteria, constituting a sequence of increasing optimality guarantees. Two of these are based on deordering plans, which means that ordering relations may only be removed, not added, while the third one uses reordering, where arbitrary modifications to the ordering are allowed. It is shown that only the weakest one of the three criteria is tractable to achieve, the other two being NP-hard and even difficult to approximate. Similarly, optimising the parallel execution time of a plan is studied both for deordering and reordering of plans. In the general case, both of these computations are NP-hard. However, it is shown that optimal deorderings can be computed in polynomial time for a class of planning languages based on the notions of producers, consumers and threats, which includes most of the commonly used planning languages. Computing optimal reorderings can potentially lead to even faster parallel executions, but this problem remains NP-hard and difficult to approximate even under quite severe restrictions.
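Why removing ordering constraints can shorten parallel execution is easy to see in a simplified setting. The sketch below is an illustration under strong assumptions that the article does not make: every action takes one time unit, any two unordered actions may run concurrently, and the ordering is acyclic. It only evaluates a given ordering; it does not search for an optimal deordering or reordering, which is the hard problem the abstract refers to. The function name and the example plans are hypothetical.

```python
from functools import lru_cache


def parallel_execution_length(actions, order):
    """Length of the longest chain in a partial order over unit-time actions.

    order is a set of (before, after) pairs. Unordered actions are assumed
    to be free to run concurrently, so the longest chain is the makespan.
    """
    predecessors = {a: [] for a in actions}
    for before, after in order:
        predecessors[after].append(before)

    @lru_cache(maxsize=None)
    def finish(action):
        # An action finishes one step after its latest predecessor.
        return 1 + max((finish(p) for p in predecessors[action]), default=0)

    return max((finish(a) for a in actions), default=0)


# A totally ordered plan versus a deordering that drops the a-before-b
# constraint, letting a and b run side by side.
plan_actions = ("a", "b", "c", "d")
total_order = {("a", "b"), ("b", "c"), ("c", "d")}
deordered = {("a", "c"), ("b", "c"), ("c", "d")}
```

Under these assumptions the totally ordered plan needs four steps while the deordered one needs three, since `a` and `b` no longer wait on each other.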

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.477

1105.5441

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Sweden (0.04)
Africa > Sudan (0.04)
(15 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
